
Add array_dot_product / list_dot_product function #12476

Closed
austin362667 wants to merge 4 commits into apache:main from austin362667:feat/array_dot_product


@austin362667 (Contributor) commented Sep 15, 2024

Which issue does this PR close?

Closes #12475.

Rationale for this change

Add dot product functionality to DataFusion. It would be valuable to add a scalar UDF, array_dot_product / list_dot_product, which computes the inner product of two arrays. This function is already supported by well-known databases such as DuckDB.
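For reference, the inner product of two equal-length vectors a and b is the sum of a[i] * b[i] over all i. The Rust sketch below shows only that definition on plain slices; it is not the PR's Arrow-based implementation, and returning `None` on a length mismatch is an assumed policy chosen for illustration.

```rust
/// Computes the inner (dot) product of two equal-length vectors.
/// Returns `None` when the lengths differ (an assumed policy for this sketch).
fn dot_product(a: &[f64], b: &[f64]) -> Option<f64> {
    if a.len() != b.len() {
        return None;
    }
    // Multiply element-wise, then sum: a[0]*b[0] + a[1]*b[1] + ...
    Some(a.iter().zip(b).map(|(x, y)| x * y).sum())
}

fn main() {
    let a = [1.0, 2.0, 3.0];
    let b = [1.0, 2.0, 5.0];
    println!("{:?}", dot_product(&a, &b)); // prints "Some(20.0)"
    println!("{:?}", dot_product(&a, &[1.0, 2.0])); // prints "None"
}
```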

What changes are included in this PR?

  • Move convert_to_f64_array into functions-nested/utils.rs.
  • Add array_dot_product / list_dot_product in functions-nested.
  • Add SQL logic tests in array.slt.
  • Update the corresponding scalar UDF docs.
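The first bullet moves `convert_to_f64_array` into a shared utils module. As a rough illustration of why a shared f64 conversion step is useful, the sketch below widens numeric list elements to `f64` before computing the product, so an integer list and a double list can share one kernel. The helper names `to_f64_vec` and `dot_product` are hypothetical; the real helper presumably operates on Arrow arrays, and its signature is not shown in this PR description.

```rust
/// Hypothetical helper: widens numeric elements to f64 so that lists of
/// different numeric types (e.g. INT and DOUBLE) can use one dot-product kernel.
fn to_f64_vec<T: Copy + Into<f64>>(xs: &[T]) -> Vec<f64> {
    xs.iter().map(|&x| x.into()).collect()
}

/// Plain slice-based dot product; `None` on length mismatch (assumed policy).
fn dot_product(a: &[f64], b: &[f64]) -> Option<f64> {
    if a.len() != b.len() {
        return None;
    }
    Some(a.iter().zip(b).map(|(x, y)| x * y).sum())
}

fn main() {
    let ints: [i32; 3] = [1, 2, 3];
    let floats: [f64; 3] = [1.0, 2.0, 5.0];
    // Both inputs are normalized to Vec<f64> before the shared kernel runs.
    let a = to_f64_vec(&ints);
    let b = to_f64_vec(&floats);
    println!("{:?}", dot_product(&a, &b)); // prints "Some(20.0)"
}
```

Bounding `T` by `Into<f64>` only admits lossless conversions (for example `i32`, `u32`, and `f32`, but not `i64`), which is one limitation a cast-based conversion on Arrow arrays would not have.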

Are these changes tested?

Yes, I added array-specific SQL logic tests covering List, LargeList, and FixedSizeList inputs.

Are there any user-facing changes?

Yes, a new function array_dot_product(arr1, arr2), with the alias list_dot_product, is added.

For example:

> CREATE TABLE word_embedding (
    emb_a DOUBLE[],
    emb_b DOUBLE[]
);
0 row(s) fetched.
Elapsed 0.008 seconds.

> INSERT INTO word_embedding VALUES
([1.0, 2.0, 3.0], [1.0, 2.0, 5.0]),
([2.0, 4.0, 6.0], [2.0, 4.0, 6.0]),
([1.5, 2.5, 3.5], [4.5, 6.5, 8.5]);
+-------+
| count |
+-------+
| 3     |
+-------+
1 row(s) fetched.
Elapsed 0.009 seconds.

> SELECT
    emb_a,
    emb_b,
    list_dot_product(emb_a, emb_b) AS inner_product
FROM
    word_embedding;
+-----------------+-----------------+---------------+
| emb_a           | emb_b           | inner_product |
+-----------------+-----------------+---------------+
| [1.0, 2.0, 3.0] | [1.0, 2.0, 5.0] | 20.0          |
| [2.0, 4.0, 6.0] | [2.0, 4.0, 6.0] | 56.0          |
| [1.5, 2.5, 3.5] | [4.5, 6.5, 8.5] | 52.75         |
+-----------------+-----------------+---------------+
3 row(s) fetched.
Elapsed 0.008 seconds.
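As a quick sanity check on the table above, each inner_product value can be reproduced by hand, for example 1.0*1.0 + 2.0*2.0 + 3.0*5.0 = 20.0. The Rust snippet below recomputes all three rows with a plain slice-based dot product, which is a stand-in for illustration and not the DataFusion kernel.

```rust
/// Slice-based dot product; panics if the two inputs differ in length.
fn dot_product(a: &[f64], b: &[f64]) -> f64 {
    assert_eq!(a.len(), b.len(), "vectors must have equal length");
    a.iter().zip(b).map(|(x, y)| x * y).sum()
}

fn main() {
    // The same (emb_a, emb_b) rows inserted into word_embedding above.
    let rows: [([f64; 3], [f64; 3]); 3] = [
        ([1.0, 2.0, 3.0], [1.0, 2.0, 5.0]),
        ([2.0, 4.0, 6.0], [2.0, 4.0, 6.0]),
        ([1.5, 2.5, 3.5], [4.5, 6.5, 8.5]),
    ];
    for (a, b) in &rows {
        // Prints 20, then 56, then 52.75, one value per line.
        println!("{}", dot_product(a, b));
    }
}
```

Every product and partial sum in these rows is an exact binary fraction, so the recomputed values match the table without rounding.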

Signed-off-by: Austin Liu <austin362667@gmail.com>
github-actions bot added the labels documentation (Improvements or additions to documentation) and sqllogictest (SQL Logic Tests (.slt)) on Sep 15, 2024
@dharanad (Contributor)

Hey @austin362667, you may want to take a look at discussion #12357.

@austin362667 (Contributor, Author)

Thank you, @dharanad, for bringing this to my attention. This is a great discussion. I like the idea of keeping the DataFusion core as simple as possible while still retaining useful DuckDB functions that enhance the user experience. I'm open to any feedback~

@alamb (Contributor) commented Sep 16, 2024

> Thank you, @dharanad, for bringing this to my attention. This is a great discussion. I like the idea of keeping the DataFusion core as simple as possible while retaining useful DuckDB functions that enhance the user experience. I'm open to any feedback~

What would you think about creating a new crate in https://github.com/datafusion-contrib to hold additional duckdb functions? Perhaps https://github.com/datafusion-contrib/datafusion-functions-duckdb, similar to https://github.com/datafusion-contrib/datafusion-functions-json for JSON from @samuelcolvin and co.

It would be a pretty neat way to help build out the function library in DataFusion.

Also, @matthewmturner and I have been working on an integrated UI similar to DuckDB's, with many features (https://github.com/datafusion-contrib/datafusion-dft). We could then integrate these functions into dft so they are easy to use.

@austin362667 (Contributor, Author)

Sure, thank you, Andrew, for proposing this initiative. I like the idea. Let's do it this way!

@alamb (Contributor) commented Sep 17, 2024

> Sure, thank you Andrew for proposing this initiative. I like the idea. Let's do it this way!!

Awesome, let's try a new repo. Follow-on discussion here: #12254 (comment)

alamb closed this Sep 17, 2024



Development

Successfully merging this pull request may close these issues.

Support array_dot_product/list_dot_product

3 participants